Fix SplitVariants task in TasksGenotypeBatch.wdl to be compatible with downstream analysis #647

kirtanav98 · 2024-02-23T19:01:40Z

The SplitVariants task used to include lines that swapped columns 5 and 6 of its BED file output, which is read by downstream tasks in TrainRDGenotyping.GenotypePESR. Without that swap, TrainRDGenotyping.GenotypePESR errors out, reporting:

Error: WARNING: Incorrect CNV type specified
1: stop("WARNING: Incorrect CNV type specified")
The Python script split_variants.py was modified to swap these columns into the order that the downstream analysis requires.
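
As a minimal sketch of the column swap described above (not the actual contents of split_variants.py), the function below exchanges the 5th and 6th tab-separated fields of one BED record. The function name `swap_columns_5_and_6` and the sample record are hypothetical and chosen only for illustration.

```python
def swap_columns_5_and_6(bed_line: str) -> str:
    """Return the BED record with its 5th and 6th columns exchanged.

    Hypothetical helper for illustration; assumes tab-delimited input
    with at least six columns.
    """
    fields = bed_line.rstrip('\n').split('\t')
    if len(fields) < 6:
        raise ValueError(f"expected at least 6 columns, got {len(fields)}")
    # Columns 5 and 6 are 1-based, so they live at indices 4 and 5.
    fields[4], fields[5] = fields[5], fields[4]
    return '\t'.join(fields)


if __name__ == '__main__':
    # Made-up record, not taken from the pipeline's real output.
    record = "chr1\t100\t200\tvar1\tDEL\tsampleA\n"
    print(swap_columns_5_and_6(record))
```

Raising on short records is a design choice for the sketch; whether the real script should skip, pad, or reject such lines is not stated in this pull request.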

epiercehoffman · 2024-02-29T16:11:18Z

src/sv-pipeline/04_variant_resolution/scripts/split_variants.py

-            # array and increments the counter for that array
+            line = line.strip('\n').split('\t')
+            line[4], line[5] = line[5], line[4]
+            SVTYPE_FIELD = 5


Instead of reassigning SVTYPE_FIELD here, you should either set SVTYPE_FIELD to 5 at the beginning or (my preference) move the code that swaps the fields to right before you append a new line to current_lines

This was addressed and the value was set to 5.
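
For comparison, the reviewer's preferred alternative (swapping right before the line is appended) might look roughly like the sketch below. It assumes the SV type sits at 0-based index 4 of the input record, so `SVTYPE_FIELD` stays constant and the condition is checked before the swap. The function name `collect_matching` and the sample records are hypothetical; the real script tracks several prefixes, counts, and suffixes that are omitted here.

```python
SVTYPE_FIELD = 4  # assumption: SV type is at 0-based index 4 in the input


def collect_matching(lines, svtype):
    """Collect records of one SV type, swapping columns 5 and 6 on output.

    Hypothetical simplification: a single list stands in for the
    per-prefix current_lines dictionary used in the script.
    """
    current_lines = []
    for raw in lines:
        fields = raw.rstrip('\n').split('\t')
        # The condition reads the original column order, so
        # SVTYPE_FIELD never has to be reassigned.
        if fields[SVTYPE_FIELD] != svtype:
            continue
        # Swap only at output time, right before appending.
        fields[4], fields[5] = fields[5], fields[4]
        current_lines.append('\t'.join(fields))
    return current_lines


if __name__ == '__main__':
    # Made-up records for illustration.
    bed = ["chr1\t1\t2\tv1\tDEL\ts1\n", "chr1\t3\t4\tv2\tDUP\ts2\n"]
    print(collect_matching(bed, 'DEL'))
```

Keeping the swap next to the append confines the reordering to one place, which is the benefit the reviewer's suggestion points at.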

epiercehoffman · 2024-02-29T16:11:40Z

src/sv-pipeline/04_variant_resolution/scripts/split_variants.py

        'ins': {'condition': lambda line: bca and line[SVTYPE_FIELD] == 'INS'}
    }

    current_lines = {prefix: [] for prefix in condition_prefixes.keys()}
    current_counts = {prefix: 0 for prefix in condition_prefixes.keys()}
    current_suffixes = {prefix: 'a' for prefix in condition_prefixes.keys()}

-    # Open the bed file and process


Please keep the comments throughout the script to help document the code's functionality

More comments were added.

epiercehoffman · 2024-02-29T16:11:52Z

src/sv-pipeline/04_variant_resolution/scripts/split_variants.py

-            # Checks which condition and prefix the current line matches and appends it to the corresponding
-            # array and increments the counter for that array
+            line = line.strip('\n').split('\t')
+            line[4], line[5] = line[5], line[4]


This needs a comment explaining what it's doing

A comment was added.

kirtanav98 added 6 commits February 23, 2024 13:57

fixed order of columns in splitvariants.py

1bad1ae

fixed linting issues

2c97f58

fixed linting issues

3473745

fixed linting issues

f212a07

fixed linting issues

499ddc7

fixed linting issues

89e51c6

epiercehoffman marked this pull request as draft February 26, 2024 16:43

kirtanav98 added 3 commits February 28, 2024 20:38

fixed script to produce all output files and swap the last two columns

807682b

fixed linting issues

ffb7107

fixed linting issues

9bf706e

epiercehoffman reviewed Feb 29, 2024

Addressed review

ac21ab6

kirtanav98 closed this Mar 1, 2024

kirtanav98 deleted the kv_splitvariants_fix branch March 1, 2024 12:49
